Use of Graph Database for the Integration of Heterogeneous Biological Data
Authors
Abstract
Understanding the complex relationships among heterogeneous biological data is one of the fundamental goals of biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data across multiple tables and reconstruct relationships through multiple-join statements. Recently, a new type of database, the graph-based database, has been developed to represent various kinds of complex relationships natively, and it is widely used in the computer science community and the IT industry. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance of MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interactions, drug-target pairs, gene-disease associations, etc.) from several existing sources, removed duplicate and redundant data, and constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested query execution performance, Neo4j outperformed MySQL in all tested cases. Neo4j responded very quickly to a variety of queries, whereas MySQL responded slowly or failed to finish on complex queries with multiple-join statements. These results show that a graph-based database such as Neo4j is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
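The abstract contrasts two approaches. One reconstructs relationships with multiple joins; the other follows relationships that are stored directly. The following is a toy illustration of that contrast, not the authors' pipeline. The records, identifiers (D1, P1, X1, ...), relationship names, and helper functions are all hypothetical. The "join" version is a deliberately naive nested-loop scan over in-memory lists rather than MySQL, and the adjacency-list version only stands in for a graph database's stored edges. The sketch also shows one plausible way to remove duplicate and redundant records before loading. It treats a protein-protein interaction as undirected, which is an assumption, so that A-B and B-A collapse into one edge.

```python
from collections import defaultdict

# Hypothetical toy records: (source_type, source_id, relation, target_type, target_id).
# Duplicates mimic the same fact arriving from two different source databases.
RAW_RECORDS = [
    ("Drug", "D1", "TARGETS", "Protein", "P1"),
    ("Drug", "D1", "TARGETS", "Protein", "P1"),            # exact duplicate
    ("Protein", "P1", "INTERACTS_WITH", "Protein", "P2"),
    ("Protein", "P2", "INTERACTS_WITH", "Protein", "P1"),  # same PPI, reversed
    ("Protein", "P2", "ASSOCIATED_WITH", "Disease", "X1"),
    ("Protein", "P3", "ASSOCIATED_WITH", "Disease", "X2"),
]

# Assumption: protein-protein interactions are treated as undirected.
UNDIRECTED = {"INTERACTS_WITH"}


def deduplicate(records):
    """Drop exact duplicates and reversed copies of undirected relationships."""
    seen = set()
    edges = []
    for s_type, s_id, rel, t_type, t_id in records:
        a, b = (s_type, s_id), (t_type, t_id)
        if rel in UNDIRECTED and b < a:
            a, b = b, a  # canonical order, so P2-P1 and P1-P2 share one key
        key = (a, rel, b)
        if key not in seen:
            seen.add(key)
            edges.append(key)
    return edges


def build_graph(edges):
    """Adjacency lists: each node maps to the (relation, neighbour) pairs it touches."""
    adj = defaultdict(list)
    for a, rel, b in edges:
        adj[a].append((rel, b))
        if rel in UNDIRECTED:
            adj[b].append((rel, a))
    return adj


def diseases_via_interactors(adj, drug_id):
    """Drug -> target -> interacting protein -> disease, by following stored edges."""
    found = set()
    for rel1, target in adj.get(("Drug", drug_id), []):
        if rel1 != "TARGETS":
            continue
        for rel2, partner in adj.get(target, []):
            if rel2 != "INTERACTS_WITH":
                continue
            for rel3, disease in adj.get(partner, []):
                if rel3 == "ASSOCIATED_WITH":
                    found.add(disease[1])
    return found


def diseases_via_joins(edges, drug_id):
    """Same question as three 'tables' combined by naive nested-loop joins."""
    drug_target = [(a[1], b[1]) for a, rel, b in edges if rel == "TARGETS"]
    ppi = [(a[1], b[1]) for a, rel, b in edges if rel == "INTERACTS_WITH"]
    ppi += [(y, x) for x, y in ppi]  # store both directions of each interaction
    gene_disease = [(a[1], b[1]) for a, rel, b in edges if rel == "ASSOCIATED_WITH"]
    return {
        disease
        for drug, target in drug_target if drug == drug_id
        for p, partner in ppi if p == target
        for q, disease in gene_disease if q == partner
    }


if __name__ == "__main__":
    edges = deduplicate(RAW_RECORDS)
    graph = build_graph(edges)
    print(len(edges))                             # 4
    print(diseases_via_interactors(graph, "D1"))  # {'X1'}
    print(diseases_via_joins(edges, "D1"))        # {'X1'}
```

Both query functions return the same answer. The traversal only touches the neighbours of nodes it has already reached, while each join step scans an entire table. A real relational engine uses indexes and better join algorithms, so this toy says nothing about the timings reported in the study.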
Similar resources
An Effective Method for Utility Preserving Social Network Graph Anonymization Based on Mathematical Modeling
In recent years, privacy concerns about publishing social network graph data have increased due to the widespread use of such data for research purposes. This paper addresses the risk of identity disclosure for a node, assuming that the adversary can identify one of its immediate neighbors in the published data. The related anonymity level of a graph is formulated and a mathematical model is...
Designing Graphical Data Storage Model for Gene-Protein and Gene-Gene Interaction Networks
A graph is an expressive way to represent dynamic and complex relationships in highly connected data. In today's highly connected world, general-purpose graph databases offer the benefits of semantically significant networks without requiring investment in graph infrastructure. Examples of prominent graph databases are Neo4j, Titan, and OrientDB. In biological OMICS l...
Radio Frequency Identification (RFID): A Technology for Enhancing Computerized Maintenance System (CMMS)
While a Computerized Maintenance Management System (CMMS) enables maintenance managers and supervisors to access information about equipment, manpower, and maintenance policies, getting data into the backend database still needs to be made easier, so that the organization can use that information to make decisions about its operations. Si...
Presenting a Solution for Data Integration in Organizations Using Web Services
Increasing speed and reducing resource usage in data integration have always been goals of developers and researchers working on the data integration process. The purpose of this study is to provide a solution that uses metadata as well as web browsing to speed up the process and make better use of resources such as memory. The proposed solution is implemented using the three-layer ar...
JXP4BIGI: a generalized, Java XML-based approach for biological information gathering and integration
MOTIVATION In the post-genomic era, biologists interested in systems biology often need to import data from public databases and construct their own system-specific or subject-oriented databases to support their complex analysis and knowledge discovery. To facilitate the analysis and data processing, customized and centralized databases are often created by extracting and integrating heterogene...
Journal title:
Volume 15, Issue
Pages: -
Publication date: 2017